A Multi-Layered Annotated Corpus of Scientific Papers
Authors
Abstract
Scientific literature records the research process in a standardized structure and provides clues for tracking progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this need, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse, the purpose of each citation is specified, and special features of scientific discourse, such as advantages and disadvantages, are identified. In addition, each sentence is assigned a grade according to its relevance for inclusion in a summary. To the best of our knowledge, such a complex, multi-layered collection of annotations and metadata characterizing a set of research papers has never before been brought together in a single corpus, which therefore constitutes a new and richer resource than those currently available in the field.
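To make the layering concrete, the following is a minimal sketch of how one annotated sentence might be represented, with one field per annotation layer named in the abstract: argumentative role, citation purpose, discourse features such as advantages and disadvantages, and a summary-relevance grade. It is not the corpus's actual file format. The class names, the label strings ("Approach", "Background", "Criticism") and the integer grade scale are hypothetical placeholders; the real tag sets and grading scale are defined by the corpus itself. The helper `summary_candidates` (also hypothetical) shows how the grade layer could be used to select sentences for a summary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    marker: str   # in-text citation string, e.g. "[4]"
    purpose: str  # citation-purpose label (hypothetical label set)


@dataclass
class AnnotatedSentence:
    text: str
    argumentative_role: str  # role in the argumentative structure (hypothetical labels)
    citations: List[Citation] = field(default_factory=list)
    features: List[str] = field(default_factory=list)  # e.g. "advantage", "disadvantage"
    summary_grade: int = 0  # relevance for a summary; integer scale is an assumption


def summary_candidates(sentences: List[AnnotatedSentence], min_grade: int) -> List[AnnotatedSentence]:
    """Return sentences graded at least min_grade, highest grade first.

    sorted() is stable, so sentences with equal grades keep document order.
    """
    ranked = sorted(sentences, key=lambda s: s.summary_grade, reverse=True)
    return [s for s in ranked if s.summary_grade >= min_grade]


# Usage with made-up sentences and labels:
s1 = AnnotatedSentence("We propose a new shading model.", "Approach", summary_grade=3)
s2 = AnnotatedSentence(
    "Prior work [4] is slow on large meshes.",
    "Background",
    citations=[Citation("[4]", "Criticism")],
    features=["disadvantage"],
    summary_grade=1,
)
print([s.text for s in summary_candidates([s2, s1], min_grade=2)])
```

Keeping every layer on the same sentence object reflects the point of a multi-layered corpus: all annotations for a sentence can be queried together rather than joined from separate files.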
Similar resources
Multi-label Annotation in Scientific Articles - The Multi-label Cancer Risk Assessment Corpus
With the constant growth of the scientific literature, automated processes to enable access to its contents are increasingly in demand. Several functional discourse annotation schemes have been proposed to facilitate information extraction and summarisation from scientific articles, the best known being argumentative zoning. Core Scientific Concepts (CoreSC) is a three-layered fine-grained...
Corpus for Coreference Resolution on Scientific Papers
The ever-growing number of published scientific papers prompts the need for automatic knowledge extraction to help scientists keep up with the state-of-the-art in their respective fields. To construct a good knowledge extraction system, annotated corpora in the scientific domain are required to train machine learning models. As described in this paper, we have constructed an annotated corpus fo...
SciCorp: A Corpus of English Scientific Articles Annotated for Information Status Analysis
This paper presents SciCorp, a corpus of full-text English scientific papers of two disciplines, genetics and computational linguistics. The corpus comprises co-reference and bridging information as well as information status labels. Since SciCorp is annotated with both labels and the respective co-referent and bridging links, we believe it is a valuable resource for NLP researchers working on ...
A Tagged Corpus for Automatic Labeling of Disabilities in Medical Scientific Papers
This paper presents the creation of a corpus of labeled disabilities in scientific papers. The identification of medical concepts in documents, and especially the identification of disabilities, is a complex task, mainly due to the variety of expressions that can refer to the same problem. Currently there is no set of documents manually annotated with disabilities with which to eval...
Adding multi-layer semantics to the Greek Dependency Treebank
In this paper we give an overview of the approach adopted to add a layer of semantic information to the Greek Dependency Treebank [GDT]. Our ultimate goal is to come up with a large corpus, reliably annotated with rich semantic structures. To this end, a corpus has been compiled encompassing various data sources and domains. This collection has been preprocessed, annotated and validated on the ...